Abstract


This paper studies a novel discriminative part-based model to represent and recognize object
shapes with an “And-Or graph”. We define this model as consisting of three layers: the leaf-nodes
with collaborative edges for localizing local parts, the or-nodes specifying the switch of
leaf-nodes, and the root-node encoding the global verification. A discriminative learning
algorithm, extended from the CCCP [23], is proposed to train the model in a dynamical manner: the
model structure (e.g., the configuration of the leaf-nodes associated with the or-nodes) is
automatically determined while the multi-layer parameters are optimized during the iteration. The
advantages of our method are two-fold. (i) The And-Or graph model enables us to handle large
intra-class variance and background clutter for object shape detection from images. (ii) The
proposed learning algorithm is able to obtain the And-Or graph representation without requiring
elaborate supervision and initialization. We validate the proposed method on several challenging
databases (e.g., INRIA-Horse, ETHZ-Shape, and UIUC-People), and it outperforms the
state-of-the-art approaches.

1 Introduction 

Part-based and hierarchical representations have been widely studied in computer vision, and have
led to some elegant frameworks for complex object detection and recognition. However, most of the
methods address only the hierarchical decomposition by tree-structure models [5, 25], and
oversimplify the reconfigurability (i.e. structural switch) in the hierarchy, which is the key to
handling the large intra-class variance in object detection. In addition, the interactions of
parts are often omitted in learning and detection. And-Or graph models were recently explored in
[26, 27] to hierarchically model object categories via “and-nodes” and “or-nodes” that represent,
respectively, compositions of parts and structural variation of parts. Their main limitation is
that the learning process is strongly supervised and the model structure needs to be manually
annotated.

The key contribution of this work is a novel And-Or graph model, whose parameters and structure
can be jointly learned in a weakly supervised manner. We achieve superior performance on the
task of detecting and localizing shapes in cluttered backgrounds, compared to the
state-of-the-art approaches. As Fig. 3(a) illustrates, the proposed And-Or graph model consists
of three layers described as follows.

The leaf-nodes in the bottom layer represent a batch of local classifiers of contour fragments. We
provide a partial matching scheme that can recognize the accurate part of the contour, to deal with
the problem that the true contours of objects are often connected to background clutter due to
unreliable edge extraction.

The or-nodes in the middle layer are “switch” variables specifying the activation of their children
leaf-nodes. We use the or-nodes to account for alternate ways of composition, rather than just
defining multi-layer compositional detectors, which is shown to better handle the intra-class
variance and the inconsistency caused by unreliable edge detection. Each or-node is used to select
one contour from the candidates detected via the associated leaf-nodes in the bottom layer.
Moreover, during detection, location displacement is allowed for each or-node to tackle part
deformation.

The root-node (i.e. the and-node) in the top layer is a global classifier capturing the holistic
deformation of the object. The contours selected via the or-nodes are further verified as a whole,
in order to make the detection robust against background clutter.

The collaborative edges between leaf-nodes are defined by the probabilistic co-occurrence of local
classifiers, which relaxes the conditional independence assumption commonly used in previous
tree-structure models. Concretely, our model allows nearby contours to interact with each other.

The key problem in training our And-Or graph model is automatic structure determination. We
propose a novel learning algorithm, namely dynamic CCCP, extended from the concave-convex
procedure (CCCP) [23, 22] by embedding the structural reconfiguration. It iterates to dynamically
determine the production of leaf-nodes associated with the or-nodes, which is often simplified by
manual fixing in previous methods [25, 16]. The other structure attributes (e.g., the layout of
or-nodes and the activation of leaf-nodes) are implicitly inferred with the latent variables.

2 Related Work 

Remarkable progress has been made in shape-based object detection [6, 10, 9, 11, 19]. By
employing some shape descriptors and matching schemes, many works represent and recognize object
shapes as a loose collection of local contours. For example, Ferrari et al. [6] used a codebook of
PAS (pairwise adjacent segments) to localize objects of interest; Maji et al. [11] proposed a
maximum margin Hough voting for hypothesis regions combined with an intersection kernel SVM
(IKSVM) for verification; Yang and Latecki [19] constructed shape models in a fully connected
graph form with partially-supervised learning, and detected objects via a Particle Filters (PF)
framework.

Recently, the tree-structure latent models [25, 5] have provided significant improvements in object
detection. Based on these methods, Srinivasan et al. [16] trained a descriptive contour-based
detector by using latent-SVM learning; Song et al. [15] integrated context information with the
learning, namely Context-SVM. Schnitzspan et al. [14] further combined latent discriminative
learning with conditional random fields using multiple features.

Knowledge representation with the And-Or graph was first introduced for modeling visual patterns by
Zhu and Mumford [27]. Its general idea, i.e. using configurable graph structures with and-nodes and
or-nodes, has been applied in object and scene parsing [26, 18, 24] and action classification [20].

3 And-Or Graph Representation for Object Shape 

The And-Or graph model is defined as $G = (V, E)$, where $V$ represents three types of nodes and
$E$ the graph edges. As Fig. 3(a) illustrates, the square on the top is the root-node representing
the complete object instances. The dashed circles derived from the root are $z$ or-nodes arranged
in a layout of $b_1 \times b_2$ blocks, representing the object parts. Each or-node comprises an
unfixed number of leaf-nodes (denoted by the solid circles on the bottom); the leaf-nodes are
allowed to be dynamically created and removed during the learning. For simplicity, we set the
maximum number $m$ of leaf-nodes affiliated to one or-node, and the parameters of non-existing
leaf-nodes to zero. Then the maximum number of all nodes in the model is $1 + n = 1 + z + z \times m$.
We use $i = 0$ to index the root node, $i = 1, \ldots, z$ the or-nodes and $j = z+1, \ldots, n$ the
leaf-nodes. We also define that $j \in ch(i)$ indexes the child nodes of node $i$. The horizontal
graph edges (i.e., collaborative edges) are defined between the leaf-nodes that are associated with
different or-nodes, in order to encode the compatibility of object parts. The definitions in $G$
are presented as follows.
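
The indexing convention above can be made concrete with a minimal Python sketch. The function name `node_indices` and the assumption that the children of or-node $i$ occupy $m$ consecutive leaf slots are illustrative choices, not details stated in the text.

```python
def node_indices(z, m):
    """Lay out node indices: 0 is the root, 1..z are or-nodes, z+1..n are leaf slots.

    Assumption (hypothetical layout): the m leaf slots of or-node i are consecutive.
    """
    n = z + z * m                      # index of the last leaf slot
    or_nodes = list(range(1, z + 1))
    children = {
        i: list(range(z + 1 + (i - 1) * m, z + 1 + i * m))
        for i in or_nodes
    }
    total_nodes = 1 + n                # root + or-nodes + leaf slots
    return or_nodes, children, total_nodes


or_nodes, children, total_nodes = node_indices(z=8, m=4)
print(total_nodes)     # 41
print(children[1])     # [9, 10, 11, 12]
```

With $z = 8$ and $m = 4$ (the setting used later for the UIUC-People dataset), the model has at most 41 nodes, and unused leaf slots simply keep zero parameters.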

Leaf-node: Each leaf-node $L_j$, $j = z+1, \ldots, n$, is a local classifier of contours, whose
placement is decided by its parent or-node (the localized block). Suppose a contour fragment $c$ on
the edge map $X$ is captured by the block located at $p_i = (p_i^x, p_i^y)$, as the input of the
classifier. We denote $\phi^l(p_i, c)$ as the feature vector using the Shape Context descriptor [3].
For any classifier, only the part of $c$ falling into the block is taken into account, and we set
$\phi^l(p_i, c) = 0$ if $c$ is entirely outside the block. The response of classifier $L_j$ at
location $p_i$ of the edge map $X$ is defined as:

$$\mathcal{R}_{L_j}(X, p_i) = \max_{c \in X} \omega_j^l \cdot \phi^l(p_i, c), \tag{1}$$

where $\omega_j^l$ is a parameter vector, which is set to zero if the corresponding leaf-node $L_j$
is nonexistent. Then we can detect the contour from the edge map $X$ via the classifier,
$c_j = \arg\max_{c \in X} \omega_j^l \cdot \phi^l(p_i, c)$.
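
As a minimal sketch of Eq. (1), assuming the Shape Context features of all candidate contours have already been computed and are passed in as plain lists (the names `dot` and `leaf_response` are hypothetical), the leaf-node response and the detected contour can be obtained with a single maximization:

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(w, x))


def leaf_response(w_leaf, features):
    """Return (max response, index of the arg-max contour) for one leaf-node.

    features[k] stands for phi^l(p_i, c_k); a contour lying entirely outside
    the block is represented by an all-zero vector, as in the text.
    """
    if not features:
        return 0.0, None
    scores = [dot(w_leaf, f) for f in features]
    best = max(range(len(scores)), key=scores.__getitem__)
    return scores[best], best


response, best = leaf_response([1.0, -1.0], [[0.2, 0.1], [0.5, 0.0], [0.0, 0.0]])
print(response, best)  # 0.5 1
```

A nonexistent leaf-node has an all-zero parameter vector, so its response is zero for every contour.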

Or-node: Each or-node $U_i$, $i = 1, \ldots, z$, is proposed to specify a proper contour from a set
of candidates detected via its children leaf-nodes. Note that we can also consider the or-node as
activating one leaf-node. The or-nodes are allowed to perturb slightly with respect to the root.
For each or-node $U_i$, we define the deformation feature as
$\phi^s(p_0, p_i) = (dx, dy, dx^2, dy^2)$, where $(dx, dy)$ is the displacement of the or-node
position $p_i$ to the expected position $p_0$ determined by the root-node. Then the cost of
locating $U_i$ at $p_i$ is:

$$Cost_i(p_0, p_i) = -\omega_i^s \cdot \phi^s(p_0, p_i), \tag{2}$$

where $\omega_i^s$ is a 4-dimensional parameter vector corresponding to $\phi^s(p_0, p_i)$. In our
method, each or-node contains at most $m$ leaf-nodes, among which one is to be activated during
inference. For each leaf-node $L_j$ associated with $U_i$, we introduce an indicator variable
$v_j \in \{0, 1\}$ representing whether it is activated or not. Then we derive the auxiliary
“switch” vector for $U_i$, $\mathbf{v}_i = (v_{j_1}, v_{j_2}, \ldots, v_{j_m})$, where
$\|\mathbf{v}_i\| = 1$. Thus, the response of the or-node $U_i$ is defined as,

$$\mathcal{R}_{U_i}(X, p_0, p_i, \mathbf{v}_i) = \sum_{j \in ch(i)} \mathcal{R}_{L_j}(X, p_i) \cdot v_j + Cost_i(p_0, p_i). \tag{3}$$
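
Eqs. (2) and (3) can be sketched in a few lines of Python. This is an illustration under stated assumptions: the leaf responses are taken as precomputed numbers, positions are (x, y) tuples, and the function names are hypothetical.

```python
def dot(w, x):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(w, x))


def deformation_feature(p0, pi):
    """phi^s(p0, pi) = (dx, dy, dx^2, dy^2) for the displacement of pi from p0."""
    dx, dy = pi[0] - p0[0], pi[1] - p0[1]
    return (dx, dy, dx * dx, dy * dy)


def or_node_response(leaf_responses, v, w_def, p0, pi):
    """Eq. (3): response of the single activated leaf plus the deformation cost."""
    if sum(v) != 1:
        raise ValueError("exactly one leaf-node must be activated")
    cost = -dot(w_def, deformation_feature(p0, pi))   # Eq. (2)
    return sum(r * a for r, a in zip(leaf_responses, v)) + cost


r = or_node_response([0.3, 0.9, 0.0, 0.0], [0, 1, 0, 0],
                     (0.0, 0.0, 0.25, 0.25), (10, 10), (12, 9))
print(round(r, 6))  # -0.35
```

Here the second leaf-node is activated (response 0.9), and a displacement of (2, -1) with quadratic weights 0.25 costs 1.25, giving a total of -0.35.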

Collaborative Edge: For any pair of leaf-nodes $(L_j, L_{j'})$ respectively associated with two
different or-nodes, we define the collaborative edge between them according to their contextual
co-occurrence, that is, how likely it is that the object contains contours detected via the two
leaf-nodes. The response of the pairwise potentials is parameterized as,

$$\mathcal{R}_E(V) = \sum_{j=z+1}^{n} \sum_{j' \in neigh(j)} \omega^e_{(j,j')} \cdot v_j \cdot v_{j'}, \tag{4}$$

where $neigh(j)$ is defined as the neighbor leaf-nodes from the other or-nodes adjacent (in spatial
direction) to $L_j$, and $V$ is a joint vector of all $\mathbf{v}_i$:
$V = (\mathbf{v}_1, \ldots, \mathbf{v}_z)$. The parameter $\omega^e_{(j,j')}$ indicates the
compatibility between $L_j$ and $L_{j'}$.
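
A minimal sketch of the pairwise term in Eq. (4), assuming dictionaries keyed by leaf index for the indicators, the neighborhoods, and the edge weights (all names here are hypothetical):

```python
def edge_response(v, w_edge, neigh):
    """Sum w_edge[(j, j')] * v[j] * v[j'] over every leaf j and its neighbors j'.

    v      : dict leaf index -> 0 or 1 (activation indicator)
    w_edge : dict (j, j') -> compatibility weight; missing pairs count as 0
    neigh  : dict leaf index -> iterable of neighbor leaf indices
    """
    return sum(
        w_edge.get((j, jp), 0.0) * v[j] * v[jp]
        for j in v
        for jp in neigh.get(j, ())
    )


v = {9: 1, 10: 0, 13: 0, 14: 1}
neigh = {9: [13, 14], 10: [13, 14]}
w_edge = {(9, 13): 2.0, (9, 14): 0.5, (10, 14): 4.0}
print(edge_response(v, w_edge, neigh))  # 0.5
```

Only pairs in which both leaf-nodes are activated contribute, so the large weight on (10, 14) has no effect in this example.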

Root-node: The root-node represents a global classifier to verify the ensemble of contour fragments
$C^r = \{c_1, \ldots, c_z\}$ proposed by the or-nodes. The response of the root-node is
parameterized as,

$$\mathcal{R}_T(C^r) = \omega^r \cdot \phi^r(C^r), \tag{5}$$

where $\phi^r(C^r)$ is the feature vector of $C^r$ and $\omega^r$ the corresponding parameter vector.

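Therefore, the overall response of the And-Or graph is:

$$\begin{aligned}
\mathcal{R}_G(X, P, V) &= \sum_{i=1}^{z} \mathcal{R}_{U_i}(X, p_0, p_i, \mathbf{v}_i) + \mathcal{R}_E(V) + \mathcal{R}_T(C^r) \\
&= \sum_{i=1}^{z} \Big[ \sum_{j \in ch(i)} \omega_j^l \cdot \phi^l(p_i, c_j) \cdot v_j - \omega_i^s \cdot \phi^s(p_0, p_i) \Big]
 + \sum_{j=z+1}^{n} \sum_{j' \in neigh(j)} \omega^e_{(j,j')} \cdot v_j \cdot v_{j'} + \omega^r \cdot \phi^r(C^r),
\end{aligned} \tag{6}$$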

where $P = (p_0, p_1, \ldots, p_z)$ is a vector of the positions of or-nodes. For better
understanding, we refer to $H = (P, V)$ as the latent variables during inference, where $P$ implies
the deformation of parts represented by the or-nodes and $V$ implies the discrete distribution of
leaf-nodes (i.e., which leaf-nodes are activated for detection). Eq. (6) can be further
simplified as:

$$\mathcal{R}_G(X, H) = \omega \cdot \phi(X, H), \tag{7}$$

where $\omega$ includes the complete parameters of the And-Or graph, and $\phi(X, H)$ is the
feature vector,

$$\omega = (\omega^l_{z+1}, \ldots, \omega^l_n, -\omega^s_1, \ldots, -\omega^s_z, \omega^e_{(z+1,z+1+m)}, \ldots, \omega^e_{(n-m,n)}, \omega^r), \tag{8}$$

$$\phi(X, H) = (\phi^l(p_1, c_{z+1}) \cdot v_{z+1}, \ldots, \phi^l(p_z, c_n) \cdot v_n, \phi^s(p_0, p_1), \ldots, \phi^s(p_0, p_z), v_{z+1} \cdot v_{z+1+m}, \ldots, v_{n-m} \cdot v_n, \phi^r(C^r)). \tag{9}$$

Figure 1: Illustration of dynamical structure learning. Parts of the model, two or-nodes
($U_1$, $U_6$), are visualized in three intermediate steps. (a) The initial structure, i.e., the
regular layout of an object. Two new structures are dynamically generated during iteration.
(b) A leaf-node associated with $U_1$ is removed. (c) A new leaf-node is created and assigned
to $U_6$.


4 Inference 

The inference task is to localize the optimal contour fragments within the detection window, which
is slid over all scales and positions of the edge map $X$. Assuming the root-node is located at
$p_0$, the object shape is localized by maximizing $\mathcal{R}_G(X, H)$ defined in Eq. (6):

$$S(p_0, X) = \max_{H} \mathcal{R}_G(X, H). \tag{10}$$

The inference procedure integrates the bottom-up testing and top-down verification: 

Bottom-up testing: For each or-node $U_i$, its children leaf-nodes (i.e. the local classifiers) are
utilized to detect contour fragments within the edge map $X$. Assume that leaf-node $L_j$,
$j \in ch(i)$, associated with $U_i$ is activated, $v_j = 1$, and the optimal contour fragment
$c_j$ is localized by maximizing the response in Eq. (3), where the optimal location $p_i^*$ is
also determined. Then we generate a set of candidates for each or-node, $\{c_j, p_i^*\}$, each of
which is one contour fragment detected via the leaf-nodes. These sets of candidates are passed
to the top-down step, where the leaf-node activation $\mathbf{v}_i$ for $U_i$ can be further
validated. We calculate the response for the bottom-up step as,

$$\mathcal{R}_{bot}(V) = \sum_{i=1}^{z} \mathcal{R}_{U_i}(X, p_0, p_i^*, \mathbf{v}_i), \tag{11}$$

where $V = \{\mathbf{v}_i\}$ denotes a hypothesis of leaf-node activation for all or-nodes. In
practice, we can further prune the candidate contours by setting a threshold on
$\mathcal{R}_{bot}(V)$. Thus, given $V = \{\mathbf{v}_i\}$, we can select an ensemble of contours
$C^r = \{c_1, \ldots, c_z\}$, each of which is detected by an activated leaf-node, $L_j$, $v_j = 1$.

Top-down verification: Given the ensemble of contours $C^r$, we then apply the global classifier
at the root-node to verify $C^r$ by Eq. (5), as well as the accumulated pairwise potentials on the
collaborative edges defined in Eq. (4).

By incorporating the bottom-up and top-down steps, we obtain the response of the And-Or graph model
by Eq. (6). The final detection is acquired by selecting the maximum score in Eq. (10).
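
The two-stage procedure can be illustrated with a small Python sketch for a single detection window. It assumes each or-node contributes a short list of candidates whose responses already include the deformation cost of Eq. (3), and it enumerates all activation hypotheses exhaustively; the exhaustive search, the threshold argument, and the callables `edge_score` and `root_score` are assumptions made for illustration, not the search strategy stated in the text.

```python
from itertools import product


def infer(or_candidates, edge_score, root_score, threshold=float("-inf")):
    """Pick one candidate per or-node to maximize the overall response (Eq. 6).

    or_candidates : list over or-nodes; each entry is a list of
                    (leaf_id, contour, response) tuples from bottom-up testing
    edge_score    : callable(list of active leaf ids) -> pairwise response, Eq. (4)
    root_score    : callable(list of contours) -> root response, Eq. (5)
    """
    best_score, best_active = float("-inf"), None
    for combo in product(*or_candidates):
        bottom = sum(r for _, _, r in combo)            # Eq. (11)
        if bottom < threshold:
            continue                                     # prune weak hypotheses
        active = [leaf for leaf, _, _ in combo]
        contours = [c for _, c, _ in combo]
        total = bottom + edge_score(active) + root_score(contours)
        if total > best_score:
            best_score, best_active = total, active
    return best_score, best_active


candidates = [
    [(9, "a", 0.5), (10, "b", 0.75)],
    [(13, "c", 0.25), (14, "d", 0.5)],
]
score, active = infer(
    candidates,
    edge_score=lambda act: 1.0 if act == [9, 14] else 0.0,
    root_score=lambda contours: 0.0,
)
print(score, active)  # 2.0 [9, 14]
```

In this toy case the pair (10, 14) has the highest bottom-up response (1.25), but the collaborative edge between leaf-nodes 9 and 14 lifts that hypothesis to 2.0, which shows how the top-down terms can overturn a purely local decision.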

5 Discriminative Learning for And-Or Graph 

We formulate the learning of the And-Or graph model as a joint optimization task for model
structure and parameters, which can be solved by an iterative method extended from the CCCP
framework [22]. This algorithm iterates to determine the And-Or graph structure in a dynamical
manner: given the inferred latent variables $H = (P, V)$ in each step, the leaf-nodes can be
automatically created or removed to generate a new structural configuration. To be specific, a new
leaf-node is encouraged to be created as the local detector for contours that cannot be handled by
the current model (Fig. 1(c)); a leaf-node is encouraged to be removed if it has similar
discriminative ability to other ones (Fig. 1(b)). We thus call this procedure dynamical CCCP
(dCCCP).

5.1 Optimization Formulation 

Suppose a set of positive and negative training samples $(X_1, y_1), \ldots, (X_N, y_N)$ are given,
where $X$ is the edge map and $y = \pm 1$ is the label indicating positive and negative samples. We
assume the samples indexed from 1 to $K$ are the positive samples, and define the feature vector
for each sample $(X, y)$ as,

$$\phi(X, y, H) = \begin{cases} \phi(X, H) & \text{if } y = +1 \\ 0 & \text{if } y = -1 \end{cases}, \tag{12}$$

where $H$ denotes the latent variables. Thus, Eq. (10) can be rewritten as a discriminative
function,

$$S_\omega(X) = \arg\max_{y, H} (\omega \cdot \phi(X, y, H)). \tag{13}$$

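The optimization of this function can be solved by using structural SVM with latent variables,

$$\min_{\omega} \frac{1}{2}\|\omega\|^2 + D \sum_{k=1}^{N} \Big[ \max_{y, H} \big(\omega \cdot \phi(X_k, y, H) + \mathcal{L}(y_k, y, H)\big) - \max_{H} \big(\omega \cdot \phi(X_k, y_k, H)\big) \Big], \tag{14}$$

where $D$ is a penalty parameter (set as 0.005 empirically), and $\mathcal{L}(y_k, y, H)$ is the
loss function. We define $\mathcal{L}(y_k, y, H) = 0$ if $y_k = y$, and 1 if $y_k \neq y$ in our
method.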

The optimization target in Eq. (14) is non-convex. The CCCP framework [23] was recently
utilized in [22, 25] to provide a local optimum solution by iteratively solving for the latent
variables $H$ and the model parameter $\omega$. However, the CCCP does not address the or-nodes in
the hierarchy, i.e., it assumes the configuration of the structure is fixed. In the following, we
propose the dCCCP by embedding a structural reconfiguration step.


5.2 Optimization with dynamic CCCP 

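Following the original CCCP framework, we convert the function in Eq. (14) into a convex and
concave form as,

$$\min_{\omega} \Big[ \frac{1}{2}\|\omega\|^2 + D \sum_{k=1}^{N} \max_{y, H} \big(\omega \cdot \phi(X_k, y, H) + \mathcal{L}(y_k, y, H)\big) \Big] - \Big[ D \sum_{k=1}^{N} \max_{H} \big(\omega \cdot \phi(X_k, y_k, H)\big) \Big] \tag{15}$$

$$= \min_{\omega} [f(\omega) - g(\omega)], \tag{16}$$

where $f(\omega)$ represents the first two terms, and $g(\omega)$ represents the last term in (15).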

The original CCCP includes two iterative steps: (I) fixing the model parameters, estimate the
latent variables $H^*$ for each positive sample; (II) compute the model parameters by the
traditional structural SVM method. In our method, besides the inferred $H^*$, we need to further
determine the graph configuration, i.e. the production of leaf-nodes associated with or-nodes, to
obtain the complete structure. Thus, we insert one step between the two original ones to perform
the structure reconfiguration. The three iterative steps are presented as follows.

(I) For optimization, we first find a hyperplane $q_t$ to upper bound the concave part
$-g(\omega)$ in Eq. (16),

$$-g(\omega) \le -g(\omega_t) + (\omega - \omega_t) \cdot q_t, \quad \forall \omega, \tag{17}$$

where $\omega_t$ includes the model parameters obtained in the previous iteration. We construct
$q_t$ by calculating the optimal latent variables
$H_k^* = \arg\max_{H} (\omega_t \cdot \phi(X_k, y_k, H))$. Since $\phi(X_k, y_k, H) = 0$ when
$y_k = -1$, we only take the positive training samples into account during computation. Then the
hyperplane is constructed as $q_t = -D \sum_{k=1}^{N} \phi(X_k, y_k, H_k^*)$.
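
The construction of $q_t$ reduces to a scaled sum of feature vectors. The sketch below assumes the feature vectors $\phi(X_k, y_k, H_k^*)$ of the positive samples have already been computed with the inferred latent variables and are given as equal-length lists; the function name `hyperplane` is hypothetical.

```python
def hyperplane(positive_features, D):
    """Return q_t = -D * sum_k phi(X_k, y_k, H_k^*).

    Negative samples have all-zero feature vectors by Eq. (12), so only the
    positive samples need to be passed in.
    """
    dim = len(positive_features[0])
    return [-D * sum(f[d] for f in positive_features) for d in range(dim)]


print(hyperplane([[1.0, 2.0], [3.0, 4.0]], D=0.5))  # [-2.0, -3.0]
```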

(II) In this step, we adjust the model structure by reconfiguring the leaf-nodes. In our model,
each leaf-node is mapped to several feature dimensions of the vector $\phi(X_k, y_k, H_k^*)$. Thus,
the process of reconfiguration is equivalent to reorganizing the feature vector
$\phi(X_k, y_k, H_k^*)$. Accordingly, the hyperplane $q_t$ would change with
$\phi(X_k, y_k, H_k^*)$, which would lead to non-convergence of learning. Therefore, we operate on
$\phi(X_k, y_k, H_k^*)$ guided by Principal Component Analysis (PCA). That is, we allow the
adjustment only on the non-principal components (dimensions) of $\phi(X_k, y_k, H_k^*)$, so as to
preserve the significant information of $\phi(X_k, y_k, H_k^*)$ [8]. As a result, $q_t$ is assumed
to be unaltered. This step of model reconfiguration can then be divided into two sub-steps.

(i) Feature refactoring guided by PCA. Given $\phi(X_k, y_k, H_k^*)$ of all positive samples, we
apply PCA on them,

$$\phi(X_k, y_k, H_k^*) \approx u + \sum_{i=1}^{\mathcal{K}} \beta_{k,i} e_i, \tag{18}$$

where $\mathcal{K}$ is the number of eigenvectors, and $e_i$ is the eigenvector with its parameter
$\beta_{k,i}$. We set $\mathcal{K}$ to a large number so that
$\|\phi(X_k, y_k, H_k^*) - (u + \sum_{i=1}^{\mathcal{K}} \beta_{k,i} e_i)\|_2 < \sigma$,
$\forall k$. For the $j$th bin of the feature

Figure 2: A toy example for structural clustering. We consider 4 samples, $X_1, \ldots, X_4$, for
training the structure of $U_i$. (a) shows the feature vectors $\phi$ of the samples associated
with $U_i$, and the intensity of the feature bin indicates the feature value. The red and green
bounding boxes on the vectors indicate the non-principal features representing the contour
fragments detected via two different leaf-nodes. (b) illustrates the clustering performed with
$\phi'$. The vector $(\phi_6, \phi_8, \phi_9)$ of $X_2$ is grouped from the right cluster to the
left one. (c) shows the adjusted feature vectors according to the clustering. Note that clustering
results in structural reconfiguration, as we discuss in the text. This figure is best viewed in
the electronic version.


vector, we consider it non-principal only if $e_{i,j} < \delta$ and $u_j < \delta$ for all $e_i$
and $u$ ($\sigma = 2.0$, $\delta = 0.001$ in experiments).
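
The non-principal test can be sketched as a simple filter over feature bins, given the PCA mean $u$ and the retained eigenvectors $e_i$ (however they were computed). One assumption is made explicit here: the comparison is applied to absolute values, since eigenvector entries can be negative; the text states the condition without absolute values. The function name is hypothetical.

```python
def non_principal_bins(mean, eigvecs, delta=0.001):
    """Return indices j with |u_j| < delta and |e_{i,j}| < delta for every eigenvector.

    Assumption: magnitudes are compared (abs), which the text does not state.
    """
    return [
        j for j in range(len(mean))
        if abs(mean[j]) < delta and all(abs(e[j]) < delta for e in eigvecs)
    ]


mean = [0.5, 0.0, 0.0002, 0.0]
eigvecs = [[0.7, 0.0, 0.0005, 0.3],
           [0.1, 0.0, 0.0, 0.0]]
print(non_principal_bins(mean, eigvecs))  # [1, 2]
```

Bins 1 and 2 carry almost no mean or variance across the positive samples, so rearranging them leaves the dominant structure of the feature vectors untouched.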

For each or-node $U_i$, a set of detected contour fragments, $\{c_i^1, c_i^2, \ldots, c_i^K\}$, is
obtained with the given $H_k^*$ of all positive samples. The feature vectors for these contours
that are generated by the leaf-nodes, $\{\phi^l(p_i^1, c_i^1), \ldots, \phi^l(p_i^K, c_i^K)\}$,
are mapped to different parts of the complete feature vectors,
$\{\phi(X_1, y_1, H_1^*), \ldots, \phi(X_K, y_K, H_K^*)\}$. More specifically, once we select the
$j$th bin for all feature vectors $\phi^l$, it can be either principal or not in different vectors
$\phi$. For all feature vectors $\phi^l$, we select the non-principal bins to form a new vector. We
thus refactor the feature vectors of these contours as
$\{\phi'(p_i^1, c_i^1), \ldots, \phi'(p_i^K, c_i^K)\}$.

(ii) Structural reconfiguration by clustering. To trigger the structural reconfiguration, for each
or-node $U_i$, we perform clustering of the detected contour fragments represented by the newly
formed feature vectors. We first group the contours detected by the same leaf-node into the same
cluster as a temporary partition. Then re-clustering is performed by applying the ISODATA
algorithm with the Euclidean distance, so that close contours are grouped into the same cluster.
According to the new partition, we can re-organize the feature vectors, i.e. represent similar
contours with the same bins in the complete feature vector $\phi$. Recall that the vector of one
contour is part of $\phi$. We present a toy example for illustration in Fig. 2. The selected
(non-principal) feature vector $\phi'(p_i^2, c_i^2) = (\phi_6, \phi_8, \phi_9)$ of $X_2$ is grouped
from one cluster to another; by comparing (a) with (c) we can observe that
$(\phi_6, \phi_8, \phi_9)$ is moved to $(\phi_1, \phi_3, \phi_4)$.

With the re-organization of feature vectors, we can accordingly reconfigure the leaf-nodes
corresponding to the clusters of contours. There are two typical cases.

• New leaf-nodes are created when more clusters are generated than before. Their parameters can be
learned based on the feature vectors of contours within the clusters.

• A leaf-node is removed when the feature bins related to it are all zero, which implies that the
contours detected by the leaf-node have been grouped into another cluster.
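
The two cases can be read off by comparing the partition before and after re-clustering. The sketch below does not implement ISODATA; it assumes the re-clustering has already been run and that clusters seeded from an existing leaf-node keep that leaf-node's label, which is a bookkeeping assumption made for illustration. The function name and the label strings are hypothetical.

```python
def reconfigure(old_leaf_of, new_cluster_of):
    """Compare two partitions of the same contours for one or-node.

    old_leaf_of    : dict contour id -> label of the leaf-node that detected it
    new_cluster_of : dict contour id -> cluster label after re-clustering
    Returns (created, removed): labels that appear only after, or only before.
    """
    old_labels = set(old_leaf_of.values())
    new_labels = set(new_cluster_of.values())
    created = sorted(new_labels - old_labels)   # clusters with no matching leaf-node
    removed = sorted(old_labels - new_labels)   # leaf-nodes left without contours
    return created, removed


old = {"c1": "L9", "c2": "L9", "c3": "L10"}
new = {"c1": "L9", "c2": "new0", "c3": "L9"}
print(reconfigure(old, new))  # (['new0'], ['L10'])
```

Here contour c2 splits off into a new cluster, so a leaf-node is created for it, while leaf-node L10 loses its only contour to L9 and is removed.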

In practice, we constrain the extent of structural reconfiguration, i.e., only a few leaf-nodes can
be created or removed for each or-node per iteration. After the structural reconfiguration, we
denote the adjusted versions of all the feature vectors $\phi(X_k, y_k, H_k^*)$ as
$\phi^d(X_k, y_k, H_k^*)$. Then the new hyperplane is generated as
$q_t^d = -D \sum_{k=1}^{N} \phi^d(X_k, y_k, H_k^*)$.

(III) Given the newly generated model structures represented by the feature vectors
$\phi^d(X_k, y_k, H_k^*)$, we can learn the model parameters by solving
$\omega_{t+1} = \arg\min_{\omega} [f(\omega) + \omega \cdot q_t^d]$. By substituting $-g(\omega)$
with the upper bound hyperplane $q_t^d$, the optimization task in Eq. (15) can be rewritten as,

$$\min_{\omega} \frac{1}{2}\|\omega\|^2 + D \sum_{k=1}^{N} \Big[ \max_{y, H} \big(\omega \cdot \phi(X_k, y, H) + \mathcal{L}(y_k, y, H)\big) - \omega \cdot \phi^d(X_k, y_k, H_k^*) \Big]. \tag{19}$$

This is a standard structural SVM problem, whose solution is presented as,

Figure 3: The trained And-Or graph model with the UIUC-People dataset. (a) visualizes the
three-layer model, where the images on the top imply the verification via the root-node.
(b) exhibits the leaf-nodes associated with the or-nodes, $U_1, \ldots, U_8$; a practical detection
with the activated leaf-nodes is highlighted in red. (c) shows the average precision (AP) results
generated by the And-Or tree (AOT) model and the And-Or graph (AOG) model.


$$\omega^* = D \sum_{k, y, H} \alpha^*_{k,y,H} \Delta\phi(X_k, y, H), \tag{20}$$

where $\Delta\phi(X_k, y, H) = \phi^d(X_k, y_k, H_k^*) - \phi(X_k, y, H)$. We calculate $\alpha^*$
by maximizing the dual function:

$$\max_{\alpha} \sum_{k, y, H} \alpha_{k,y,H} \mathcal{L}(y_k, y, H) - \frac{D}{2} \sum_{k, k'} \sum_{y, H, y', H'} \alpha_{k,y,H} \alpha_{k',y',H'} \Delta\phi(X_k, y, H) \cdot \Delta\phi(X_{k'}, y', H'). \tag{21}$$

It is a dual problem in standard SVM, which can be solved by applying the cutting plane method [1]
and Sequential Minimal Optimization [13]. Thus, we obtain the updated parameters $\omega_{t+1}$,
and continue the 3-step iteration until the function in Eq. (16) converges.
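
The overall control flow of the three-step iteration can be sketched as a generic loop. The four callables stand in for the steps described above (latent inference, structural reconfiguration, the structural SVM solver, and evaluation of the objective in Eq. (16)); their names, the tolerance, and the iteration cap are placeholders chosen for illustration rather than settings from the text.

```python
def dcccp(omega, infer_latent, reconfigure, solve_ssvm, objective,
          tol=1e-4, max_iter=20):
    """Skeleton of the 3-step iteration; returns (omega, iterations used)."""
    prev = objective(omega)
    iterations = 0
    for _ in range(max_iter):
        latent = infer_latent(omega)       # step I: H_k^* for positive samples
        features = reconfigure(latent)     # step II: adjusted feature vectors
        omega = solve_ssvm(features)       # step III: structural SVM update
        iterations += 1
        current = objective(omega)
        if abs(prev - current) < tol:      # stop when the objective stabilizes
            break
        prev = current
    return omega, iterations


# Toy stand-ins: each "update" halves a scalar parameter.
omega, iterations = dcccp(
    1.0,
    infer_latent=lambda w: w,
    reconfigure=lambda h: h,
    solve_ssvm=lambda f: f / 2,
    objective=lambda w: w * w,
)
print(omega, iterations)  # 0.00390625 8
```

The toy stand-ins only exercise the loop and its stopping rule; in the actual method each callable would wrap the corresponding computation over the training set.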

5.3 Initialization 

At the beginning of learning, the And-Or graph model can be initialized as follows. For each
training sample (whose contours have been extracted), we partition it into a regular layout of
several blocks, each of which corresponds to one or-node. The contours falling into a block are
treated as the input for learning. If there are more than two contours in one block, we select the
longest one. Then the leaf-nodes are generated by clustering the selected contours without any
constraints, and we can thus obtain the initial feature vector for each sample.

6 Experiments 

We evaluate our method for object shape detection, using three benchmark datasets: the UIUC- 
People [17], the ETHZ-Shape [7] and the INRIA-Horse [7]. 

Implementation setting. We fix the number of or-nodes in the And-Or model as 8 for the UIUC-People
dataset, and 6 in the other experiments. The initial layout is a regular partition (e.g. 4 x 2
blocks for the UIUC-People dataset and 2 x 3 for the others). There are at most m = 4 leaf-nodes
for each or-node. For positive samples, we extract their clutter-free object contours; for negative
samples, we compute their edge maps by using the Pb edge detector [12] with an edge link method.
Our learning algorithm converges in 6 to 9 iterations. During detection, the edge maps of test
images are extracted in the same way as for negative training samples, and the object is searched
within them at 6 different scales, 2 per octave. For each contour used as input to a leaf-node, we
sample 20 points and compute the Shape Context descriptor for each point; the descriptor is
quantized with 6 polar angles and 2 radial bins. We adopt the testing criterion defined in the
PASCAL VOC challenge: a detection is counted as correct if the intersection over union with the
groundtruth is at least 50%.
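
The correctness criterion can be written down directly. The sketch below assumes axis-aligned boxes given as (x1, y1, x2, y2) tuples in continuous coordinates; the function names are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def is_correct(detection, groundtruth, threshold=0.5):
    """A detection counts as correct if its IoU with the groundtruth is at least 50%."""
    return iou(detection, groundtruth) >= threshold


print(is_correct((0, 0, 10, 10), (0, 0, 10, 5)))    # True  (IoU = 0.5)
print(is_correct((0, 0, 10, 10), (5, 5, 15, 15)))   # False (IoU = 25/175)
```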

Experiment I. The UIUC-People dataset contains 593 images (346 for training, 247 for testing).
Most of the images contain one person playing badminton. Fig. 3(b) shows the trained And-Or
model (AOG), in which each of the 8 or-nodes is associated with 2 to 4 leaf-nodes. To evaluate the
benefit of the collaborative edges, we degenerate our model to the And-Or Tree (AOT) by removing
the collaborative edges. As Fig. 3(c) illustrates, the average precisions (AP) of detection by
applying AOG and AOT are 56.20% and 53.84%, respectively. Then we compare our model with the
state-of-the-art detectors in [18, 2, 4, 5], some of which used manually labeled models. Following the

(a)
Method                   Accuracy
Our AOG                  0.680
Our AOT                  0.660
Wang et al. [18]         0.668
Andriluka et al. [2]     0.506
Felz et al. [5]          0.486
Bourdev et al. [4]       0.458

(b)
Method                   Applelogos  Bottles  Giraffes  Mugs   Swans  Average
Our method               0.910       0.926    0.803     0.885  0.968  0.898
Ma et al. [10]           0.881       0.920    0.756     0.868  0.959  0.877
Srinivasan et al. [16]   0.845       0.916    0.787     0.888  0.922  0.872
Maji et al. [11]         0.869       0.724    0.742     0.806  0.716  0.771
Felz et al. [5]          0.891       0.950    0.608     0.721  0.391  0.712
Lu et al. [9]            0.844       0.641    0.617     0.643  0.798  0.709

Table 1: (a) Comparisons of detection accuracies on the UIUC-People dataset. (b) Comparisons of
average precision (AP) on the ETHZ-Shape dataset.


metric mentioned in [18], to calculate the detection accuracy, we only consider the detection with
the highest score on an image for all the methods. As Table 1(a) reports, our method outperforms
the other approaches.



Figure 4: (a) Experimental results with the recall-FPPI measurement on the INRIA-Horse database.
(b), (c) and (d) show a few object shape detections obtained by applying our method on the three
datasets; the false positives are annotated by blue frames.


Experiment II. The INRIA-Horse dataset consists of 170 horse images and 170 images without
horses. Among them, 50 positive examples and 80 negative examples are used for training and the
remaining 210 images for testing. Fig. 4(a) reports the plots of false positives per image (FPPI)
vs. recall. Our system outperforms the recent methods: the AOG and AOT models achieve detection
rates of 89.6% and 88.0% at 1.0 FPPI, respectively; in contrast, the results of competing methods
are 87.3% in [21], 85.27% in [11], 80.77% in [7], and 73.75% in [6].

Experiment III. We test our method with more object categories on the ETHZ-Shape dataset:
Applelogos, Bottles, Giraffes, Mugs and Swans. For each category (including 32 to 87 images), half
of the images are randomly selected as positive examples, and 70 to 90 negative examples are
obtained from the other categories as well as backgrounds. The trained model for each category is
tested on the remaining images. Table 1(b) reports the results evaluated by the mean average
precision. Compared with the current methods [11, 16, 5, 9, 10], our model achieves very
competitive results.

A few results are visualized in Fig. 4(b), (c) and (d) for Experiments I, II, and III, respectively.

7 Conclusion 

This paper proposes a discriminative contour-based object model with the And-Or graph
representation. This model can be trained in a dynamical manner, in which the model structure is
automatically determined during iterations together with the parameters. Our method achieves
state-of-the-art performance for object shape detection on challenging datasets.